Dataset statistics
| Number of variables | 17 |
|---|---|
| Number of observations | 160112 |
| Missing cells | 0 |
| Missing cells (%) | 0.0% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 20.8 MiB |
| Average record size in memory | 136.0 B |
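The memory figures above are consistent with one another, which a quick arithmetic check makes visible. The sketch below assumes every cell costs 8 bytes (a 64-bit number or an object pointer, as in a shallow pandas memory estimate); that per-cell size is an assumption and is not stated anywhere in the report.

```python
# Figures copied from the "Dataset statistics" table.
N_VARIABLES = 17
N_OBSERVATIONS = 160_112

# Assumption: every cell occupies 8 bytes (int64/float64 value or object pointer).
BYTES_PER_CELL = 8

record_bytes = N_VARIABLES * BYTES_PER_CELL
total_mib = record_bytes * N_OBSERVATIONS / 2**20
column_mib = BYTES_PER_CELL * N_OBSERVATIONS / 2**20

print(f"Average record size: {record_bytes} B")
print(f"Total size in memory: {total_mib:.1f} MiB")
print(f"Memory size per variable: {column_mib:.1f} MiB")
```

Under that assumption the three results line up with the 136.0 B record size, the 20.8 MiB total, and the 1.2 MiB "Memory size" reported for each variable.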
Variable types
| Numeric | 9 |
|---|---|
| Categorical | 8 |
Alerts
| Alert | Type |
|---|---|
| RTPR has constant value "0" | Constant |
| CMD has a high cardinality: 406 distinct values | High cardinality |
| ts is highly correlated with Status and 2 other fields | High correlation |
| PID is highly correlated with type | High correlation |
| TRUN is highly correlated with State and 1 other field | High correlation |
| TSLPI is highly correlated with PRI | High correlation |
| NICE is highly correlated with PRI | High correlation |
| PRI is highly correlated with POLI and 3 other fields | High correlation |
| CPUNR is highly correlated with RTPR | High correlation |
| EXC is highly correlated with RTPR | High correlation |
| CPU is highly correlated with TRUN | High correlation |
| TSLPU is highly correlated with State | High correlation |
| POLI is highly correlated with PRI and 2 other fields | High correlation |
| RTPR is highly correlated with State and 5 other fields | High correlation |
| Status is highly correlated with ts and 5 other fields | High correlation |
| State is highly correlated with TRUN and 4 other fields | High correlation |
| label is highly correlated with ts and 2 other fields | High correlation |
| type is highly correlated with ts and 3 other fields | High correlation |
| EXC is highly skewed (γ1 = 101.0338824) | Skewed |
| TRUN has 147066 (91.9%) zeros | Zeros |
| TSLPI has 13983 (8.7%) zeros | Zeros |
| NICE has 117993 (73.7%) zeros | Zeros |
| PRI has 4315 (2.7%) zeros | Zeros |
| CPUNR has 42811 (26.7%) zeros | Zeros |
| EXC has 159514 (99.6%) zeros | Zeros |
| CPU has 116771 (72.9%) zeros | Zeros |
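The percentages in the "Zeros" entries are simply the zero count divided by the number of observations. A minimal sketch that recomputes them from the figures listed above; the helper name `zero_share` is illustrative and not part of pandas-profiling.

```python
# Zero counts copied from the "Zeros" entries; total rows from "Number of observations".
N_OBSERVATIONS = 160_112
zero_counts = {
    "TRUN": 147_066,
    "TSLPI": 13_983,
    "NICE": 117_993,
    "PRI": 4_315,
    "CPUNR": 42_811,
    "EXC": 159_514,
    "CPU": 116_771,
}


def zero_share(count: int, total: int = N_OBSERVATIONS) -> float:
    """Return the percentage of rows equal to zero, rounded to one decimal."""
    return round(100 * count / total, 1)


zero_pct = {name: zero_share(count) for name, count in zero_counts.items()}
for name, pct in zero_pct.items():
    print(f"{name}: {pct}% zeros")
```

Each recomputed share agrees with the percentage printed next to the corresponding count.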
Reproduction
| Analysis started | 2022-11-12 02:32:13.118327 |
|---|---|
| Analysis finished | 2022-11-12 02:33:17.648247 |
| Duration | 1 minute and 4.53 seconds |
| Software version | pandas-profiling v3.4.0 |
| Download configuration | config.json |
ts
| Distinct | 141852 |
|---|---|
| Distinct (%) | 88.6% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1555145386 |
| Minimum | 1554218915 |
|---|---|
| Maximum | 1556549129 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 1.2 MiB |
Quantile statistics
| Minimum | 1554218915 |
|---|---|
| 5-th percentile | 1554258943 |
| Q1 | 1554419054 |
| median | 1554619192 |
| Q3 | 1556222938 |
| 95-th percentile | 1556365714 |
| Maximum | 1556549129 |
| Range | 2330214 |
| Interquartile range (IQR) | 1803884.25 |
Descriptive statistics
| Standard deviation | 881681.2642 |
|---|---|
| Coefficient of variation (CV) | 0.0005669445905 |
| Kurtosis | -1.686601961 |
| Mean | 1555145386 |
| Median Absolute Deviation (MAD) | 300560 |
| Skewness | 0.4755534352 |
| Sum | 2.489974381 × 10^14 |
| Variance | 7.773618516 × 10^11 |
| Monotonicity | Not monotonic |
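Several of the quantile and descriptive figures for this variable are derived from one another. The sketch below recomputes them from the values printed in the two tables; because the report shows the mean and quartiles rounded to whole numbers, the IQR and sum come out approximate rather than exact.

```python
# Values copied from the quantile and descriptive tables for this variable.
minimum, maximum = 1_554_218_915, 1_556_549_129
q1, q3 = 1_554_419_054, 1_556_222_938  # displayed rounded to whole numbers
mean = 1_555_145_386                   # displayed rounded to a whole number
std = 881_681.2642
n = 160_112                            # number of observations

value_range = maximum - minimum  # Range
iqr = q3 - q1                    # Interquartile range (approximate, rounded inputs)
cv = std / mean                  # Coefficient of variation
variance = std ** 2              # Variance
total = mean * n                 # Sum (approximate, rounded mean)

print(f"range={value_range}, iqr~{iqr}, cv={cv:.10e}")
print(f"variance={variance:.9e}, sum~{total:.9e}")
```

The variance and sum land at roughly 7.77 × 10^11 and 2.49 × 10^14, which is how the exponents in the descriptive table should be read.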
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 1556208193 | 3 | < 0.1% |
| 1556226673 | 3 | < 0.1% |
| 1556226748 | 3 | < 0.1% |
| 1556226743 | 3 | < 0.1% |
| 1556226738 | 3 | < 0.1% |
| 1556226733 | 3 | < 0.1% |
| 1556226728 | 3 | < 0.1% |
| 1556226718 | 3 | < 0.1% |
| 1556226713 | 3 | < 0.1% |
| 1556226708 | 3 | < 0.1% |
| Other values (141842) | 160082 |
| Value | Count | Frequency (%) |
| 1554218915 | 1 | |
| 1554218920 | 1 | |
| 1554218925 | 1 | |
| 1554218930 | 1 | |
| 1554218935 | 1 | |
| 1554218940 | 1 | |
| 1554218945 | 1 | |
| 1554218950 | 1 | |
| 1554218955 | 1 | |
| 1554218960 | 1 |
| Value | Count | Frequency (%) |
| 1556549129 | 2 | |
| 1556548639 | 2 | |
| 1556548364 | 2 | |
| 1556547914 | 2 | |
| 1556547464 | 2 | |
| 1556547359 | 2 | |
| 1556547354 | 2 | |
| 1556547344 | 2 | |
| 1556547264 | 2 | |
| 1556547224 | 2 |
PID
| Distinct | 4011 |
|---|---|
| Distinct (%) | 2.5% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 3256.037293 |
| Minimum | 1007 |
|---|---|
| Maximum | 53075 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 1.2 MiB |
Quantile statistics
| Minimum | 1007 |
|---|---|
| 5-th percentile | 1371 |
| Q1 | 2533 |
| median | 3058 |
| Q3 | 3793 |
| 95-th percentile | 5007 |
| Maximum | 53075 |
| Range | 52068 |
| Interquartile range (IQR) | 1260 |
Descriptive statistics
| Standard deviation | 2071.338868 |
|---|---|
| Coefficient of variation (CV) | 0.6361532998 |
| Kurtosis | 384.5555235 |
| Mean | 3256.037293 |
| Median Absolute Deviation (MAD) | 694 |
| Skewness | 16.25900556 |
| Sum | 521330643 |
| Variance | 4290444.707 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 3790 | 4183 | 2.6% |
| 3793 | 4165 | 2.6% |
| 3675 | 3979 | 2.5% |
| 3678 | 3965 | 2.5% |
| 3677 | 3964 | 2.5% |
| 3676 | 3961 | 2.5% |
| 1442 | 3960 | 2.5% |
| 2797 | 3959 | 2.5% |
| 2774 | 3959 | 2.5% |
| 1371 | 3954 | 2.5% |
| Other values (4001) | 120063 |
| Value | Count | Frequency (%) |
| 1007 | 158 | |
| 1019 | 1 | < 0.1% |
| 1026 | 136 | |
| 1063 | 40 | < 0.1% |
| 1087 | 12 | < 0.1% |
| 1103 | 37 | < 0.1% |
| 1108 | 1 | < 0.1% |
| 1113 | 1 | < 0.1% |
| 1124 | 121 | |
| 1133 | 59 | < 0.1% |
| Value | Count | Frequency (%) |
| 53075 | 1 | |
| 53074 | 1 | |
| 53064 | 1 | |
| 53060 | 1 | |
| 53051 | 1 | |
| 53046 | 1 | |
| 53045 | 2 | |
| 53044 | 1 | |
| 53043 | 2 | |
| 53042 | 1 |
TRUN
| Distinct | 10 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.08309183571 |
| Minimum | 0 |
|---|---|
| Maximum | 9 |
| Zeros | 147066 |
| Zeros (%) | 91.9% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 1.2 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 0 |
| 95-th percentile | 1 |
| Maximum | 9 |
| Range | 9 |
| Interquartile range (IQR) | 0 |
Descriptive statistics
| Standard deviation | 0.2843146755 |
|---|---|
| Coefficient of variation (CV) | 3.421692072 |
| Kurtosis | 24.3423872 |
| Mean | 0.08309183571 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 3.724719218 |
| Sum | 13304 |
| Variance | 0.08083483469 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=10)
| Value | Count | Frequency (%) |
| 0 | 147066 | |
| 1 | 12839 | 8.0% |
| 2 | 183 | 0.1% |
| 3 | 13 | < 0.1% |
| 4 | 5 | < 0.1% |
| 5 | 2 | < 0.1% |
| 7 | 1 | < 0.1% |
| 8 | 1 | < 0.1% |
| 9 | 1 | < 0.1% |
| 6 | 1 | < 0.1% |
| Value | Count | Frequency (%) |
| 0 | 147066 | |
| 1 | 12839 | 8.0% |
| 2 | 183 | 0.1% |
| 3 | 13 | < 0.1% |
| 4 | 5 | < 0.1% |
| 5 | 2 | < 0.1% |
| 6 | 1 | < 0.1% |
| 7 | 1 | < 0.1% |
| 8 | 1 | < 0.1% |
| 9 | 1 | < 0.1% |
| Value | Count | Frequency (%) |
| 9 | 1 | < 0.1% |
| 8 | 1 | < 0.1% |
| 7 | 1 | < 0.1% |
| 6 | 1 | < 0.1% |
| 5 | 2 | < 0.1% |
| 4 | 5 | < 0.1% |
| 3 | 13 | < 0.1% |
| 2 | 183 | 0.1% |
| 1 | 12839 | 8.0% |
| 0 | 147066 |
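Because this variable takes only ten distinct values, its summary statistics can be rebuilt from the frequency table alone. A minimal sketch, assuming the report uses the sample variance (n - 1 in the denominator, the pandas default):

```python
import math

# Value counts copied from the frequency table above (value -> count).
value_counts = {0: 147_066, 1: 12_839, 2: 183, 3: 13, 4: 5, 5: 2, 6: 1, 7: 1, 8: 1, 9: 1}

n = sum(value_counts.values())
total = sum(value * count for value, count in value_counts.items())
mean = total / n

# Sample variance: squared deviations weighted by count, divided by n - 1.
sum_sq_dev = sum(count * (value - mean) ** 2 for value, count in value_counts.items())
variance = sum_sq_dev / (n - 1)
std = math.sqrt(variance)

print(f"n={n}, sum={total}, mean={mean:.10f}")
print(f"variance={variance:.10f}, std={std:.10f}")
```

The counts add up to the 160112 observations, and the recomputed sum, mean, variance, and standard deviation agree with the descriptive statistics table.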
TSLPI
| Distinct | 46 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 3.91681448 |
| Minimum | 0 |
|---|---|
| Maximum | 67 |
| Zeros | 13983 |
| Zeros (%) | 8.7% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 1.2 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 1 |
| median | 1 |
| Q3 | 4 |
| 95-th percentile | 13 |
| Maximum | 67 |
| Range | 67 |
| Interquartile range (IQR) | 3 |
Descriptive statistics
| Standard deviation | 7.856579839 |
|---|---|
| Coefficient of variation (CV) | 2.005859578 |
| Kurtosis | 31.00150788 |
| Mean | 3.91681448 |
| Median Absolute Deviation (MAD) | 1 |
| Skewness | 5.153660518 |
| Sum | 627129 |
| Variance | 61.72584677 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=46)
| Value | Count | Frequency (%) |
| 1 | 74593 | |
| 3 | 21720 | 13.6% |
| 4 | 15844 | 9.9% |
| 0 | 13983 | 8.7% |
| 5 | 8249 | 5.2% |
| 2 | 6010 | 3.8% |
| 11 | 5642 | 3.5% |
| 9 | 3691 | 2.3% |
| 17 | 2074 | 1.3% |
| 27 | 1363 | 0.9% |
| Other values (36) | 6943 | 4.3% |
| Value | Count | Frequency (%) |
| 0 | 13983 | 8.7% |
| 1 | 74593 | |
| 2 | 6010 | 3.8% |
| 3 | 21720 | 13.6% |
| 4 | 15844 | 9.9% |
| 5 | 8249 | 5.2% |
| 6 | 403 | 0.3% |
| 7 | 904 | 0.6% |
| 8 | 401 | 0.3% |
| 9 | 3691 | 2.3% |
| Value | Count | Frequency (%) |
| 67 | 6 | < 0.1% |
| 66 | 5 | < 0.1% |
| 65 | 1 | < 0.1% |
| 64 | 11 | < 0.1% |
| 63 | 48 | < 0.1% |
| 62 | 39 | < 0.1% |
| 61 | 294 | 0.2% |
| 60 | 909 | |
| 59 | 312 | 0.2% |
| 58 | 138 | 0.1% |
TSLPU
| Distinct | 5 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 1.2 MiB |
| 0 | 159969 |
|---|---|
| 1 | 139 |
| 2 | 2 |
| 21 | 1 |
| 17 | 1 |
Length
| Max length | 2 |
|---|---|
| Median length | 1 |
| Mean length | 1.000012491 |
| Min length | 1 |
Characters and Unicode
| Total characters | 160114 |
|---|---|
| Distinct characters | 4 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 2 |
|---|---|
| Unique (%) | < 0.1% |
Sample
| 1st row | 0 |
|---|---|
| 2nd row | 0 |
| 3rd row | 0 |
| 4th row | 0 |
| 5th row | 0 |
Common Values
| Value | Count | Frequency (%) |
| 0 | 159969 | |
| 1 | 139 | 0.1% |
| 2 | 2 | < 0.1% |
| 21 | 1 | < 0.1% |
| 17 | 1 | < 0.1% |
Length
Histogram of lengths of the category
Category Frequency Plot
| Value | Count | Frequency (%) |
| 0 | 159969 | |
| 1 | 139 | 0.1% |
| 2 | 2 | < 0.1% |
| 21 | 1 | < 0.1% |
| 17 | 1 | < 0.1% |
Most occurring characters
| Value | Count | Frequency (%) |
| 0 | 159969 | |
| 1 | 141 | 0.1% |
| 2 | 3 | < 0.1% |
| 7 | 1 | < 0.1% |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 160114 |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 0 | 159969 | |
| 1 | 141 | 0.1% |
| 2 | 3 | < 0.1% |
| 7 | 1 | < 0.1% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 160114 |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| 0 | 159969 | |
| 1 | 141 | 0.1% |
| 2 | 3 | < 0.1% |
| 7 | 1 | < 0.1% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 160114 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 0 | 159969 | |
| 1 | 141 | 0.1% |
| 2 | 3 | < 0.1% |
| 7 | 1 | < 0.1% |
POLI
| Distinct | 3 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 1.2 MiB |
| norm | 155797 |
|---|---|
| 0 | 2910 |
| - | 1405 |
Length
| Max length | 4 |
|---|---|
| Median length | 4 |
| Mean length | 3.919150345 |
| Min length | 1 |
Characters and Unicode
| Total characters | 627503 |
|---|---|
| Distinct characters | 6 |
| Distinct categories | 3 |
| Distinct scripts | 2 |
| Distinct blocks | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | norm |
|---|---|
| 2nd row | norm |
| 3rd row | norm |
| 4th row | norm |
| 5th row | norm |
Common Values
| Value | Count | Frequency (%) |
| norm | 155797 | |
| 0 | 2910 | 1.8% |
| - | 1405 | 0.9% |
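The length and character statistics reported for this variable follow directly from the three value counts above. The sketch below rebuilds them with Python's standard `unicodedata` module; it is an independent check, not the code pandas-profiling runs. The two-letter codes it prints map to the report's names as Ll = Lowercase Letter, Nd = Decimal Number, and Pd = Dash Punctuation.

```python
import unicodedata
from collections import Counter

# Value counts copied from the "Common Values" table above.
value_counts = {"norm": 155_797, "0": 2_910, "-": 1_405}

n = sum(value_counts.values())
total_chars = sum(len(value) * count for value, count in value_counts.items())
mean_length = total_chars / n

# Tally every character by its Unicode general category, weighted by row count.
category_counts = Counter()
for value, count in value_counts.items():
    for char in value:
        category_counts[unicodedata.category(char)] += count

print(f"total characters={total_chars}, mean length={mean_length:.9f}")
print(dict(category_counts))
```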
Length
Histogram of lengths of the category
Category Frequency Plot
| Value | Count | Frequency (%) |
| norm | 155797 | |
| 0 | 2910 | 1.8% |
| - | 1405 | 0.9% |
Most occurring characters
| Value | Count | Frequency (%) |
| n | 155797 | |
| o | 155797 | |
| r | 155797 | |
| m | 155797 | |
| 0 | 2910 | 0.5% |
| - | 1405 | 0.2% |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 623188 | |
| Decimal Number | 2910 | 0.5% |
| Dash Punctuation | 1405 | 0.2% |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
| n | 155797 | |
| o | 155797 | |
| r | 155797 | |
| m | 155797 |
Decimal Number
| Value | Count | Frequency (%) |
| 0 | 2910 |
Dash Punctuation
| Value | Count | Frequency (%) |
| - | 1405 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 623188 | |
| Common | 4315 | 0.7% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| n | 155797 | |
| o | 155797 | |
| r | 155797 | |
| m | 155797 |
Common
| Value | Count | Frequency (%) |
| 0 | 2910 | |
| - | 1405 |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 627503 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| n | 155797 | |
| o | 155797 | |
| r | 155797 | |
| m | 155797 | |
| 0 | 2910 | 0.5% |
| - | 1405 | 0.2% |
NICE
| Distinct | 6 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 5.15747102 |
| Minimum | 0 |
|---|---|
| Maximum | 20 |
| Zeros | 117993 |
| Zeros (%) | 73.7% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 1.2 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 20 |
| 95-th percentile | 20 |
| Maximum | 20 |
| Range | 20 |
| Interquartile range (IQR) | 20 |
Descriptive statistics
| Standard deviation | 8.722291837 |
|---|---|
| Coefficient of variation (CV) | 1.691195511 |
| Kurtosis | -0.7667436257 |
| Mean | 5.15747102 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 1.107347917 |
| Sum | 825773 |
| Variance | 76.0783749 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=6)
| Value | Count | Frequency (%) |
| 0 | 117993 | |
| 20 | 40864 | 25.5% |
| 1 | 555 | 0.3% |
| 11 | 434 | 0.3% |
| 10 | 210 | 0.1% |
| 19 | 56 | < 0.1% |
| Value | Count | Frequency (%) |
| 0 | 117993 | |
| 1 | 555 | 0.3% |
| 10 | 210 | 0.1% |
| 11 | 434 | 0.3% |
| 19 | 56 | < 0.1% |
| 20 | 40864 | 25.5% |
| Value | Count | Frequency (%) |
| 20 | 40864 | 25.5% |
| 19 | 56 | < 0.1% |
| 11 | 434 | 0.3% |
| 10 | 210 | 0.1% |
| 1 | 555 | 0.3% |
| 0 | 117993 |
PRI
| Distinct | 8 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 111.6547479 |
| Minimum | 0 |
|---|---|
| Maximum | 139 |
| Zeros | 4315 |
| Zeros (%) | 2.7% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 1.2 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 100 |
| Q1 | 100 |
| median | 120 |
| Q3 | 120 |
| 95-th percentile | 120 |
| Maximum | 139 |
| Range | 139 |
| Interquartile range (IQR) | 20 |
Descriptive statistics
| Standard deviation | 20.52109038 |
|---|---|
| Coefficient of variation (CV) | 0.1837905755 |
| Kurtosis | 20.66816672 |
| Mean | 111.6547479 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | -4.33798329 |
| Sum | 17877265 |
| Variance | 421.1151502 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=8)
| Value | Count | Frequency (%) |
| 120 | 113678 | |
| 100 | 40864 | 25.5% |
| 0 | 4315 | 2.7% |
| 121 | 555 | 0.3% |
| 109 | 434 | 0.3% |
| 130 | 208 | 0.1% |
| 139 | 56 | < 0.1% |
| 110 | 2 | < 0.1% |
| Value | Count | Frequency (%) |
| 0 | 4315 | 2.7% |
| 100 | 40864 | 25.5% |
| 109 | 434 | 0.3% |
| 110 | 2 | < 0.1% |
| 120 | 113678 | |
| 121 | 555 | 0.3% |
| 130 | 208 | 0.1% |
| 139 | 56 | < 0.1% |
| Value | Count | Frequency (%) |
| 139 | 56 | < 0.1% |
| 130 | 208 | 0.1% |
| 121 | 555 | 0.3% |
| 120 | 113678 | |
| 110 | 2 | < 0.1% |
| 109 | 434 | 0.3% |
| 100 | 40864 | 25.5% |
| 0 | 4315 | 2.7% |
RTPR
| Distinct | 1 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 1.2 MiB |
| 0 | 160112 |
|---|---|
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Characters and Unicode
| Total characters | 160112 |
|---|---|
| Distinct characters | 1 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 0 |
|---|---|
| 2nd row | 0 |
| 3rd row | 0 |
| 4th row | 0 |
| 5th row | 0 |
Common Values
| Value | Count | Frequency (%) |
| 0 | 160112 |
Length
Histogram of lengths of the category
Category Frequency Plot
| Value | Count | Frequency (%) |
| 0 | 160112 |
Most occurring characters
| Value | Count | Frequency (%) |
| 0 | 160112 |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 160112 |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 0 | 160112 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 160112 |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| 0 | 160112 |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 160112 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 0 | 160112 |
CPUNR
| Distinct | 6 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1.569351454 |
| Minimum | 0 |
|---|---|
| Maximum | 5 |
| Zeros | 42811 |
| Zeros (%) | 26.7% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 1.2 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 2 |
| Q3 | 3 |
| 95-th percentile | 3 |
| Maximum | 5 |
| Range | 5 |
| Interquartile range (IQR) | 3 |
Descriptive statistics
| Standard deviation | 1.245370844 |
|---|---|
| Coefficient of variation (CV) | 0.7935576449 |
| Kurtosis | -0.6543222896 |
| Mean | 1.569351454 |
| Median Absolute Deviation (MAD) | 1 |
| Skewness | 0.2960978891 |
| Sum | 251272 |
| Variance | 1.550948539 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=6)
| Value | Count | Frequency (%) |
| 0 | 42811 | |
| 2 | 41052 | |
| 3 | 36879 | |
| 1 | 33930 | |
| 5 | 2841 | 1.8% |
| 4 | 2599 | 1.6% |
| Value | Count | Frequency (%) |
| 0 | 42811 | |
| 1 | 33930 | |
| 2 | 41052 | |
| 3 | 36879 | |
| 4 | 2599 | 1.6% |
| 5 | 2841 | 1.8% |
| Value | Count | Frequency (%) |
| 5 | 2841 | 1.8% |
| 4 | 2599 | 1.6% |
| 3 | 36879 | |
| 2 | 41052 | |
| 1 | 33930 | |
| 0 | 42811 |
Status
| Distinct | 7 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 1.2 MiB |
| - | 125604 |
|---|---|
| 0 | 29080 |
| N | 2759 |
| NE | 2620 |
| NS | 47 |
| Other values (2) | 2 |
Length
| Max length | 2 |
|---|---|
| Median length | 1 |
| Mean length | 1.016663336 |
| Min length | 1 |
Characters and Unicode
| Total characters | 162780 |
|---|---|
| Distinct characters | 6 |
| Distinct categories | 3 |
| Distinct scripts | 2 |
| Distinct blocks | 1 |
Unique
| Unique | 2 |
|---|---|
| Unique (%) | < 0.1% |
Sample
| 1st row | 0 |
|---|---|
| 2nd row | 0 |
| 3rd row | 0 |
| 4th row | 0 |
| 5th row | 0 |
Common Values
| Value | Count | Frequency (%) |
| - | 125604 | |
| 0 | 29080 | 18.2% |
| N | 2759 | 1.7% |
| NE | 2620 | 1.6% |
| NS | 47 | < 0.1% |
| C | 1 | < 0.1% |
| NC | 1 | < 0.1% |
Length
Histogram of lengths of the category
Category Frequency Plot
| Value | Count | Frequency (%) |
| - | 125604 | |
| 0 | 29080 | 18.2% |
| n | 2759 | 1.7% |
| ne | 2620 | 1.6% |
| ns | 47 | < 0.1% |
| c | 1 | < 0.1% |
| nc | 1 | < 0.1% |
Most occurring characters
| Value | Count | Frequency (%) |
| - | 125604 | |
| 0 | 29080 | 17.9% |
| N | 5427 | 3.3% |
| E | 2620 | 1.6% |
| S | 47 | < 0.1% |
| C | 2 | < 0.1% |
Most occurring categories
| Value | Count | Frequency (%) |
| Dash Punctuation | 125604 | |
| Decimal Number | 29080 | 17.9% |
| Uppercase Letter | 8096 | 5.0% |
Most frequent character per category
Uppercase Letter
| Value | Count | Frequency (%) |
| N | 5427 | |
| E | 2620 | |
| S | 47 | 0.6% |
| C | 2 | < 0.1% |
Dash Punctuation
| Value | Count | Frequency (%) |
| - | 125604 |
Decimal Number
| Value | Count | Frequency (%) |
| 0 | 29080 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 154684 | |
| Latin | 8096 | 5.0% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| N | 5427 | |
| E | 2620 | |
| S | 47 | 0.6% |
| C | 2 | < 0.1% |
Common
| Value | Count | Frequency (%) |
| - | 125604 | |
| 0 | 29080 | 18.8% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 162780 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| - | 125604 | |
| 0 | 29080 | 17.9% |
| N | 5427 | 3.3% |
| E | 2620 | 1.6% |
| S | 47 | < 0.1% |
| C | 2 | < 0.1% |
EXC
| Distinct | 8 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.01766263615 |
| Minimum | 0 |
|---|---|
| Maximum | 100 |
| Zeros | 159514 |
| Zeros (%) | 99.6% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 1.2 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 0 |
| 95-th percentile | 0 |
| Maximum | 100 |
| Range | 100 |
| Interquartile range (IQR) | 0 |
Descriptive statistics
| Standard deviation | 0.6928328088 |
|---|---|
| Coefficient of variation (CV) | 39.22590052 |
| Kurtosis | 13692.86972 |
| Mean | 0.01766263615 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 101.0338824 |
| Sum | 2828 |
| Variance | 0.4800173009 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=8)
| Value | Count | Frequency (%) |
| 0 | 159514 | |
| 1 | 441 | 0.3% |
| 15 | 107 | 0.1% |
| 9 | 19 | < 0.1% |
| 2 | 19 | < 0.1% |
| 11 | 6 | < 0.1% |
| 100 | 5 | < 0.1% |
| 7 | 1 | < 0.1% |
| Value | Count | Frequency (%) |
| 0 | 159514 | |
| 1 | 441 | 0.3% |
| 2 | 19 | < 0.1% |
| 7 | 1 | < 0.1% |
| 9 | 19 | < 0.1% |
| 11 | 6 | < 0.1% |
| 15 | 107 | 0.1% |
| 100 | 5 | < 0.1% |
| Value | Count | Frequency (%) |
| 100 | 5 | < 0.1% |
| 15 | 107 | 0.1% |
| 11 | 6 | < 0.1% |
| 9 | 19 | < 0.1% |
| 7 | 1 | < 0.1% |
| 2 | 19 | < 0.1% |
| 1 | 441 | 0.3% |
| 0 | 159514 |
State
| Distinct | 6 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 1.2 MiB |
| S | 145383 |
|---|---|
| R | 8583 |
| E | 4315 |
| I | 1590 |
| D | 129 |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Characters and Unicode
| Total characters | 160112 |
|---|---|
| Distinct characters | 6 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | S |
|---|---|
| 2nd row | S |
| 3rd row | S |
| 4th row | S |
| 5th row | S |
Common Values
| Value | Count | Frequency (%) |
| S | 145383 | |
| R | 8583 | 5.4% |
| E | 4315 | 2.7% |
| I | 1590 | 1.0% |
| D | 129 | 0.1% |
| Z | 112 | 0.1% |
Length
Histogram of lengths of the category
Category Frequency Plot
| Value | Count | Frequency (%) |
| s | 145383 | |
| r | 8583 | 5.4% |
| e | 4315 | 2.7% |
| i | 1590 | 1.0% |
| d | 129 | 0.1% |
| z | 112 | 0.1% |
Most occurring characters
| Value | Count | Frequency (%) |
| S | 145383 | |
| R | 8583 | 5.4% |
| E | 4315 | 2.7% |
| I | 1590 | 1.0% |
| D | 129 | 0.1% |
| Z | 112 | 0.1% |
Most occurring categories
| Value | Count | Frequency (%) |
| Uppercase Letter | 160112 |
Most frequent character per category
Uppercase Letter
| Value | Count | Frequency (%) |
| S | 145383 | |
| R | 8583 | 5.4% |
| E | 4315 | 2.7% |
| I | 1590 | 1.0% |
| D | 129 | 0.1% |
| Z | 112 | 0.1% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 160112 |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| S | 145383 | |
| R | 8583 | 5.4% |
| E | 4315 | 2.7% |
| I | 1590 | 1.0% |
| D | 129 | 0.1% |
| Z | 112 | 0.1% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 160112 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| S | 145383 | |
| R | 8583 | 5.4% |
| E | 4315 | 2.7% |
| I | 1590 | 1.0% |
| D | 129 | 0.1% |
| Z | 112 | 0.1% |
CPU
| Distinct | 237 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.03834053662 |
| Minimum | 0 |
|---|---|
| Maximum | 4 |
| Zeros | 116771 |
| Zeros (%) | 72.9% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 1.2 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 0.01 |
| 95-th percentile | 0.08 |
| Maximum | 4 |
| Range | 4 |
| Interquartile range (IQR) | 0.01 |
Descriptive statistics
| Standard deviation | 0.1884109857 |
|---|---|
| Coefficient of variation (CV) | 4.914145765 |
| Kurtosis | 65.43413661 |
| Mean | 0.03834053662 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 6.826015102 |
| Sum | 6138.78 |
| Variance | 0.03549869953 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 0 | 116771 | |
| 0.01 | 28522 | 17.8% |
| 0.02 | 4495 | 2.8% |
| 0.99 | 1867 | 1.2% |
| 0.98 | 1308 | 0.8% |
| 1 | 923 | 0.6% |
| 0.03 | 895 | 0.6% |
| 0.04 | 429 | 0.3% |
| 0.05 | 385 | 0.2% |
| 0.1 | 378 | 0.2% |
| Other values (227) | 4139 | 2.6% |
| Value | Count | Frequency (%) |
| 0 | 116771 | |
| 0.01 | 28522 | 17.8% |
| 0.02 | 4495 | 2.8% |
| 0.03 | 895 | 0.6% |
| 0.04 | 429 | 0.3% |
| 0.05 | 385 | 0.2% |
| 0.06 | 282 | 0.2% |
| 0.07 | 160 | 0.1% |
| 0.08 | 224 | 0.1% |
| 0.09 | 350 | 0.2% |
| Value | Count | Frequency (%) |
| 4 | 11 | |
| 3.85 | 1 | < 0.1% |
| 3.84 | 3 | < 0.1% |
| 3.78 | 1 | < 0.1% |
| 3.74 | 1 | < 0.1% |
| 3.69 | 1 | < 0.1% |
| 3.65 | 1 | < 0.1% |
| 3.63 | 1 | < 0.1% |
| 3.53 | 1 | < 0.1% |
| 3.52 | 1 | < 0.1% |
CMD
| Distinct | 406 |
|---|---|
| Distinct (%) | 0.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 1.2 MiB |
| atop | 40831 |
|---|---|
| vmtoolsd | 9304 |
| apache2 | 8347 |
| Xorg | 5873 |
| nautilus | 5665 |
| Other values (401) | 90092 |
Length
| Max length | 19 |
|---|---|
| Median length | 13 |
| Mean length | 7.954944039 |
| Min length | 2 |
Characters and Unicode
| Total characters | 1273682 |
|---|---|
| Distinct characters | 61 |
| Distinct categories | 10 |
| Distinct scripts | 2 |
| Distinct blocks | 1 |
Unique
| Unique | 87 |
|---|---|
| Unique (%) | 0.1% |
Sample
| 1st row | atop |
|---|---|
| 2nd row | nautilus |
| 3rd row | upstart-dbus-b |
| 4th row | drone |
| 5th row | atop |
Common Values
| Value | Count | Frequency (%) |
| atop | 40831 | |
| vmtoolsd | 9304 | 5.8% |
| apache2 | 8347 | 5.2% |
| Xorg | 5873 | 3.7% |
| nautilus | 5665 | 3.5% |
| ostinato | 5260 | 3.3% |
| compiz | 5254 | 3.3% |
| irqbalance | 5221 | 3.3% |
| hud-service | 5125 | 3.2% |
| drone | 4928 | 3.1% |
| Other values (396) | 64304 |
Length
Histogram of lengths of the category
| Value | Count | Frequency (%) |
| atop | 40850 | |
| vmtoolsd | 9304 | 5.8% |
| apache2 | 9250 | 5.7% |
| xorg | 5873 | 3.6% |
| nautilus | 5672 | 3.5% |
| compiz | 5264 | 3.3% |
| ostinato | 5260 | 3.3% |
| irqbalance | 5221 | 3.2% |
| hud-service | 5125 | 3.2% |
| drone | 4928 | 3.1% |
| Other values (347) | 64509 |
Most occurring characters
| Value | Count | Frequency (%) |
| o | 128162 | 10.1% |
| a | 115833 | 9.1% |
| t | 113244 | 8.9% |
| e | 96955 | 7.6% |
| p | 83900 | 6.6% |
| n | 67461 | 5.3% |
| s | 65793 | 5.2% |
| d | 56935 | 4.5% |
| i | 56843 | 4.5% |
| r | 52940 | 4.2% |
| Other values (51) | 435616 |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 1155346 | |
| Dash Punctuation | 50279 | 3.9% |
| Decimal Number | 31753 | 2.5% |
| Uppercase Letter | 13430 | 1.1% |
| Other Punctuation | 12508 | 1.0% |
| Math Symbol | 8626 | 0.7% |
| Space Separator | 1144 | 0.1% |
| Connector Punctuation | 594 | < 0.1% |
| Open Punctuation | 1 | < 0.1% |
| Close Punctuation | 1 | < 0.1% |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
| o | 128162 | |
| a | 115833 | 10.0% |
| t | 113244 | 9.8% |
| e | 96955 | 8.4% |
| p | 83900 | 7.3% |
| n | 67461 | 5.8% |
| s | 65793 | 5.7% |
| d | 56935 | 4.9% |
| i | 56843 | 4.9% |
| r | 52940 | 4.6% |
| Other values (16) | 317280 |
Uppercase Letter
| Value | Count | Frequency (%) |
| X | 5873 | |
| W | 3093 | |
| C | 2407 | |
| E | 708 | 5.3% |
| M | 703 | 5.2% |
| N | 625 | 4.7% |
| T | 6 | < 0.1% |
| G | 4 | < 0.1% |
| I | 4 | < 0.1% |
| O | 4 | < 0.1% |
| Other values (3) | 3 | < 0.1% |
Decimal Number
| Value | Count | Frequency (%) |
| 2 | 16413 | |
| 5 | 4796 | 15.1% |
| 6 | 4688 | 14.8% |
| 0 | 2540 | 8.0% |
| 1 | 2281 | 7.2% |
| 3 | 810 | 2.6% |
| 7 | 212 | 0.7% |
| 4 | 9 | < 0.1% |
| 8 | 3 | < 0.1% |
| 9 | 1 | < 0.1% |
Math Symbol
| Value | Count | Frequency (%) |
| < | 4315 | |
| > | 4308 | |
| ~ | 2 | < 0.1% |
| + | 1 | < 0.1% |
Other Punctuation
| Value | Count | Frequency (%) |
| / | 6296 | |
| : | 6183 | |
| . | 29 | 0.2% |
Dash Punctuation
| Value | Count | Frequency (%) |
| - | 50279 |
Space Separator
| Value | Count | Frequency (%) |
| (space) | 1144 |
Connector Punctuation
| Value | Count | Frequency (%) |
| _ | 594 |
Open Punctuation
| Value | Count | Frequency (%) |
| ( | 1 |
Close Punctuation
| Value | Count | Frequency (%) |
| ) | 1 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 1168776 | |
| Common | 104906 | 8.2% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| o | 128162 | 11.0% |
| a | 115833 | 9.9% |
| t | 113244 | 9.7% |
| e | 96955 | 8.3% |
| p | 83900 | 7.2% |
| n | 67461 | 5.8% |
| s | 65793 | 5.6% |
| d | 56935 | 4.9% |
| i | 56843 | 4.9% |
| r | 52940 | 4.5% |
| Other values (29) | 330710 |
Common
| Value | Count | Frequency (%) |
| - | 50279 | |
| 2 | 16413 | 15.6% |
| / | 6296 | 6.0% |
| : | 6183 | 5.9% |
| 5 | 4796 | 4.6% |
| 6 | 4688 | 4.5% |
| < | 4315 | 4.1% |
| > | 4308 | 4.1% |
| 0 | 2540 | 2.4% |
| 1 | 2281 | 2.2% |
| Other values (12) | 2807 | 2.7% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 1273682 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| o | 128162 | 10.1% |
| a | 115833 | 9.1% |
| t | 113244 | 8.9% |
| e | 96955 | 7.6% |
| p | 83900 | 6.6% |
| n | 67461 | 5.3% |
| s | 65793 | 5.2% |
| d | 56935 | 4.5% |
| i | 56843 | 4.5% |
| r | 52940 | 4.2% |
| Other values (51) | 435616 |
label
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 1.2 MiB |
| 0 | 100000 |
|---|---|
| 1 | 60112 |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Characters and Unicode
| Total characters | 160112 |
|---|---|
| Distinct characters | 2 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 1 |
|---|---|
| 2nd row | 1 |
| 3rd row | 1 |
| 4th row | 1 |
| 5th row | 1 |
Common Values
| Value | Count | Frequency (%) |
| 0 | 100000 | |
| 1 | 60112 |
Length
Histogram of lengths of the category
Category Frequency Plot
| Value | Count | Frequency (%) |
| 0 | 100000 | |
| 1 | 60112 |
Most occurring characters
| Value | Count | Frequency (%) |
| 0 | 100000 | |
| 1 | 60112 |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 160112 |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 0 | 100000 | |
| 1 | 60112 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 160112 |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| 0 | 100000 | |
| 1 | 60112 |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 160112 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 0 | 100000 | 62.5% |
| 1 | 60112 | 37.5% |
| Distinct | 8 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 1.2 MiB |
| normal | 100000 |
|---|---|
| dos | 10000 |
| ddos | 10000 |
| injection | 10000 |
| password | 10000 |
| Other values (3) | 20112 |
Length
| Max length | 9 |
|---|---|
| Median length | 6 |
| Mean length | 5.936144699 |
| Min length | 3 |
Characters and Unicode
| Total characters | 950448 |
|---|---|
| Distinct characters | 17 |
| Distinct categories | 1 ? |
| Distinct scripts | 1 ? |
| Distinct blocks | 1 ? |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
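The length and character figures for this variable can be cross-checked from its value counts alone. The sketch below is a minimal re-computation, not the profiler's own code: the counts are copied from the Common Values table, and all variable names are hypothetical.

```python
from collections import Counter
from statistics import median

# Value counts copied from the Common Values table for this variable.
type_counts = {
    "normal": 100000, "dos": 10000, "ddos": 10000, "injection": 10000,
    "password": 10000, "scanning": 10000, "xss": 10000, "mitm": 112,
}

n_rows = sum(type_counts.values())
total_chars = sum(len(value) * count for value, count in type_counts.items())
mean_length = total_chars / n_rows

# Expanding to one length per row keeps the median computation obvious.
lengths = [len(value) for value, count in type_counts.items() for _ in range(count)]
median_length = median(lengths)

# Character frequencies, weighting each character by its value's count.
char_counts = Counter()
for value, count in type_counts.items():
    for ch in value:
        char_counts[ch] += count

print(n_rows, total_chars, round(mean_length, 9), median_length, len(char_counts))
```

The results agree with the tables: 950448 total characters, 17 distinct characters, a median length of 6, and `n` as the most frequent character with 150000 occurrences.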
Unique
| Unique | 0 ? |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | dos |
|---|---|
| 2nd row | dos |
| 3rd row | dos |
| 4th row | dos |
| 5th row | dos |
Common Values
| Value | Count | Frequency (%) |
| normal | 100000 | 62.5% |
| dos | 10000 | 6.2% |
| ddos | 10000 | 6.2% |
| injection | 10000 | 6.2% |
| password | 10000 | 6.2% |
| scanning | 10000 | 6.2% |
| xss | 10000 | 6.2% |
| mitm | 112 | 0.1% |
Length
Histogram of lengths of the category
Category Frequency Plot
| Value | Count | Frequency (%) |
| normal | 100000 | 62.5% |
| dos | 10000 | 6.2% |
| ddos | 10000 | 6.2% |
| injection | 10000 | 6.2% |
| password | 10000 | 6.2% |
| scanning | 10000 | 6.2% |
| xss | 10000 | 6.2% |
| mitm | 112 | 0.1% |
Most occurring characters
| Value | Count | Frequency (%) |
| n | 150000 | 15.8% |
| o | 140000 | 14.7% |
| a | 120000 | 12.6% |
| r | 110000 | 11.6% |
| m | 100224 | 10.5% |
| l | 100000 | 10.5% |
| s | 70000 | 7.4% |
| d | 40000 | 4.2% |
| i | 30112 | 3.2% |
| c | 20000 | 2.1% |
| Other values (7) | 70112 | 7.4% |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 950448 | 100.0% |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
| n | 150000 | 15.8% |
| o | 140000 | 14.7% |
| a | 120000 | 12.6% |
| r | 110000 | 11.6% |
| m | 100224 | 10.5% |
| l | 100000 | 10.5% |
| s | 70000 | 7.4% |
| d | 40000 | 4.2% |
| i | 30112 | 3.2% |
| c | 20000 | 2.1% |
| Other values (7) | 70112 | 7.4% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 950448 | 100.0% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| n | 150000 | 15.8% |
| o | 140000 | 14.7% |
| a | 120000 | 12.6% |
| r | 110000 | 11.6% |
| m | 100224 | 10.5% |
| l | 100000 | 10.5% |
| s | 70000 | 7.4% |
| d | 40000 | 4.2% |
| i | 30112 | 3.2% |
| c | 20000 | 2.1% |
| Other values (7) | 70112 | 7.4% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 950448 | 100.0% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| n | 150000 | 15.8% |
| o | 140000 | 14.7% |
| a | 120000 | 12.6% |
| r | 110000 | 11.6% |
| m | 100224 | 10.5% |
| l | 100000 | 10.5% |
| s | 70000 | 7.4% |
| d | 40000 | 4.2% |
| i | 30112 | 3.2% |
| c | 20000 | 2.1% |
| Other values (7) | 70112 | 7.4% |
Auto
The auto setting is an easily interpretable pairwise column metric that uses the following mapping (vartype-vartype: method): categorical-categorical: Cramér's V; numerical-categorical: Cramér's V (using a discretized numerical column); numerical-numerical: Spearman's ρ. This configuration picks the most suitable method for each pair of columns.
Spearman's ρ
Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation and +1 indicating total positive monotonic correlation. To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
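The rank-then-correlate recipe described above can be sketched in plain Python. This is a minimal illustration, not the profiler's implementation: the function names are hypothetical, ties receive average ranks (a common convention that the text does not specify), and the sample data is made up.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of positions i..j, shifted to 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def covariance(a, b):
    """Population covariance of two equal-length sequences."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / len(a)


def spearman_rho(x, y):
    """Covariance of the ranks divided by the product of their standard deviations."""
    rx, ry = average_ranks(x), average_ranks(y)
    return covariance(rx, ry) / sqrt(covariance(rx, rx) * covariance(ry, ry))


# Made-up data: y = x**2 is nonlinear but monotonic, so rho reaches 1.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
rho = spearman_rho(x, y)
print(rho)
```

Because only the ordering matters, any strictly increasing transformation of `y` leaves `rho` unchanged.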
Pearson's r
Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r. To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
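A small sketch of this formula, with a check of the location and scale invariance, might look as follows. The function name and the sample data are hypothetical, and the rescaling uses positive scale factors, since a negative factor would flip the sign of r.

```python
from math import sqrt


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    std_x = sqrt(sum((a - mean_x) ** 2 for a in x) / n)
    std_y = sqrt(sum((b - mean_y) ** 2 for b in y) / n)
    return cov / (std_x * std_y)


# Made-up, nearly linear data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

r = pearson_r(x, y)
# Shift and rescale each variable separately; r should not change.
r_rescaled = pearson_r([2 * a + 3 for a in x], [0.5 * b - 1 for b in y])
print(r, r_rescaled)
```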
Kendall's τ
Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation and +1 indicating total positive correlation. To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
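The pair-counting definition above corresponds to what is usually called τ-a. The sketch below follows it literally; the function name is hypothetical, tied pairs count as neither concordant nor discordant, and library implementations often use tie-adjusted variants instead, so results can differ when ties are present.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant - discordant) / total pairs, following the definition in the text."""
    concordant = discordant = 0
    pairs = list(combinations(range(len(x)), 2))
    for i, j in pairs:
        # A pair is concordant when x and y move in the same direction.
        direction = (x[i] - x[j]) * (y[i] - y[j])
        if direction > 0:
            concordant += 1
        elif direction < 0:
            discordant += 1
    return (concordant - discordant) / len(pairs)


# Made-up data: two concordant pairs and one discordant pair out of three.
tau = kendall_tau_a([1, 2, 3], [1, 3, 2])
print(tau)
```

This brute-force version checks every pair, so it runs in quadratic time and is meant only to make the definition concrete.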
Cramér's V (φc)
Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been proved to be biased, even for large samples. We use a bias-corrected measure proposed by Bergsma in 2013.
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. Extensive documentation is available.
A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
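Returning to the bias-corrected Cramér's V mentioned in the correlation notes above, the following is a minimal sketch of Bergsma's 2013 correction in plain Python, not the profiler's own code. The function name is hypothetical. The contingency table is an assumption: it supposes that every row with label 0 has type normal and that every other type has label 1, which matches the reported counts (100000 and 60112) but is not stated outright in the report.

```python
from math import sqrt


def cramers_v_corrected(table):
    """Bias-corrected Cramér's V for a contingency table with no empty row or column."""
    n = sum(sum(row) for row in table)
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    r, k = len(row_sums), len(col_sums)

    # Pearson chi-squared statistic against the independence expectation.
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_sums[i] * col_sums[j] / n
            chi2 += (observed - expected) ** 2 / expected

    # Bergsma's correction shrinks phi^2 and the effective table dimensions.
    phi2_corr = max(0.0, chi2 / n - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(r_corr - 1, k_corr - 1)
    return sqrt(phi2_corr / denom) if denom > 0 else 0.0


# Assumed label x type table (rows: label 0, label 1; columns: normal, dos, ddos,
# injection, password, scanning, xss, mitm).
label_by_type = [
    [100000, 0, 0, 0, 0, 0, 0, 0],
    [0, 10000, 10000, 10000, 10000, 10000, 10000, 112],
]
v_label_type = cramers_v_corrected(label_by_type)

# A table whose rows are proportional shows no association at all.
v_independent = cramers_v_corrected([[10, 20], [20, 40]])
print(v_label_type, v_independent)
```

Under that assumption the two variables are almost perfectly associated, so the coefficient comes out just below 1, while the proportional table yields 0.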
First rows
| | ts | PID | TRUN | TSLPI | TSLPU | POLI | NICE | PRI | RTPR | CPUNR | Status | EXC | State | CPU | CMD | label | type |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1556129658 | 52888 | 0 | 1 | 0 | norm | 20 | 100 | 0 | 0 | 0 | 0 | S | 0.01 | atop | 1 | dos |
| 1 | 1556129738 | 2791 | 0 | 5 | 0 | norm | 0 | 120 | 0 | 0 | 0 | 0 | S | 0.00 | nautilus | 1 | dos |
| 2 | 1556129778 | 2504 | 0 | 1 | 0 | norm | 0 | 120 | 0 | 2 | 0 | 0 | S | 0.00 | upstart-dbus-b | 1 | dos |
| 3 | 1556129788 | 3147 | 1 | 12 | 0 | norm | 0 | 120 | 0 | 1 | 0 | 0 | S | 1.00 | drone | 1 | dos |
| 4 | 1556129798 | 52888 | 0 | 1 | 0 | norm | 20 | 100 | 0 | 0 | 0 | 0 | S | 0.01 | atop | 1 | dos |
| 5 | 1556129823 | 3144 | 0 | 3 | 0 | norm | 0 | 120 | 0 | 0 | 0 | 0 | S | 0.01 | ostinato | 1 | dos |
| 6 | 1556129898 | 1424 | 0 | 1 | 0 | norm | 0 | 120 | 0 | 0 | 0 | 0 | S | 0.00 | irqbalance | 1 | dos |
| 7 | 1556129913 | 3147 | 1 | 12 | 0 | norm | 0 | 120 | 0 | 1 | 0 | 0 | S | 1.00 | drone | 1 | dos |
| 8 | 1556129923 | 3144 | 0 | 3 | 0 | norm | 0 | 120 | 0 | 1 | 0 | 0 | S | 0.01 | ostinato | 1 | dos |
| 9 | 1556129933 | 52886 | 0 | 1 | 0 | norm | 20 | 100 | 0 | 3 | 0 | 0 | S | 0.01 | atop | 1 | dos |
Last rows
| | ts | PID | TRUN | TSLPI | TSLPU | POLI | NICE | PRI | RTPR | CPUNR | Status | EXC | State | CPU | CMD | label | type |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 160102 | 1554250825 | 2753 | 0 | 3 | 0 | norm | 0 | 120 | 0 | 2 | - | 0 | S | 0.0 | ibus-engine-si | 0 | normal |
| 160103 | 1554250830 | 1931 | 0 | 3 | 0 | norm | 0 | 120 | 0 | 2 | - | 0 | S | 0.0 | upowerd | 0 | normal |
| 160104 | 1554250835 | 2870 | 0 | 2 | 0 | norm | 0 | 120 | 0 | 1 | - | 0 | S | 0.0 | gvfs-mtp-volum | 0 | normal |
| 160105 | 1554250840 | 2691 | 0 | 4 | 0 | norm | 0 | 120 | 0 | 3 | - | 0 | S | 0.0 | indicator-soun | 0 | normal |
| 160106 | 1554250845 | 1452 | 0 | 3 | 0 | norm | 0 | 120 | 0 | 0 | - | 0 | S | 0.0 | accounts-daemo | 0 | normal |
| 160107 | 1554250850 | 2517 | 0 | 2 | 0 | norm | 0 | 120 | 0 | 0 | - | 0 | S | 0.0 | gvfsd | 0 | normal |
| 160108 | 1554250855 | 2751 | 0 | 3 | 0 | norm | 0 | 120 | 0 | 1 | - | 0 | S | 0.0 | dconf-service | 0 | normal |
| 160109 | 1554250860 | 2583 | 0 | 4 | 0 | norm | 0 | 120 | 0 | 2 | - | 0 | S | 0.0 | ibus-dconf | 0 | normal |
| 160110 | 1554250865 | 2489 | 0 | 2 | 0 | norm | 0 | 120 | 0 | 3 | - | 0 | S | 0.0 | at-spi2-regist | 0 | normal |
| 160111 | 1554250870 | 2327 | 0 | 1 | 0 | norm | 0 | 120 | 0 | 1 | - | 0 | S | 0.0 | init | 0 | normal |